STARS - 2017

Section: New Results

Recognizing Human Actions Using RGB Sport Videos From the Web

Participants : Amir Nazemi, François Brémond.

Keywords: Action Recognition, Activity Recognition, Video Summarization, Web Sport Videos, Golf Videos.

The aim of this work is to extract sport actions from a web sport streaming video and use them for highlight detection. The sport videos used in this research are golf videos. The report describes four steps, including data preparation, method selection and experimental results.

Figure 23. Output of human pose detection on one frame of the Golf video dataset.

Data Preparation

Table 8. The Golf dataset.
Class name    Number of samples
Tee shot      73
Putt          70
Standing      81

Table 9. Experimental results of the two methods on the Golf dataset.
Method                         Accuracy on Golf dataset
LSTM + Geometrical Features    91.5 %
P-CNN                          97.32 %

First, a dataset is built from a streaming video. This dataset contains three action classes: Tee shot, Putt and Standing. Table 8 gives the number of samples per class (224 in total).
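The report does not say how the 224 clips were divided for training and testing. As a minimal sketch, assuming a stratified hold-out split, the code below keeps the class proportions of Table 8 in both subsets. The 80/20 ratio, the seed and the function name `stratified_split` are illustrative assumptions, not the report's protocol.

```python
import random
from collections import Counter

# Class sizes taken from Table 8.
CLASS_COUNTS = {"Tee shot": 73, "Putt": 70, "Standing": 81}


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test sets, keeping class proportions.

    Hypothetical helper: the 80/20 ratio and the seed are assumptions,
    not values from the report.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within each class before cutting
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# One label per clip, expanded from the per-class counts.
labels = [name for name, n in CLASS_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(labels), len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

With these assumptions the split yields 179 training and 45 test clips (15 Tee shot, 14 Putt, 16 Standing in the test set). Since the three classes are of similar size, plain accuracy is a reasonable headline metric here.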


After preparing the dataset, the next step is to define a solution to the problem. Since one of the main goals of this research is to provide a general solution for sport videos, we proposed a solution based on the skeleton, i.e. human poses. Our proposed framework consists of three successive stages: human pose detection, human tracking and action recognition. For human pose detection, we used a recent method named OpenPose [105]. For human pose tracking, we used a tracking method from the Inria STARS SUP framework. Finally, for action recognition, we ran experiments to choose the best method.
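The three-stage framework can be sketched as follows. This is a minimal sketch under stated assumptions: `detect_poses` and `classify_sequence` are hypothetical callables standing in for OpenPose and the action classifier (neither is actually called here), and `greedy_track` is a simple nearest-centroid tracker used only as a stand-in for the SUP tracking method, whose algorithm is not described in this report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import numpy as np

# One person's 2D keypoints in one frame, shape (J, 2); assumed OpenPose-like layout.
Pose = np.ndarray


@dataclass
class Track:
    track_id: int
    poses: List[Pose] = field(default_factory=list)

    def last_center(self) -> np.ndarray:
        """Centroid of the most recent pose in the track."""
        return self.poses[-1].mean(axis=0)


def greedy_track(frames_poses: List[List[Pose]], max_dist: float = 50.0) -> List[Track]:
    """Link per-frame detections into tracks by nearest pose centroid.

    Stand-in for the SUP tracker (not its actual algorithm): a track that
    finds no detection within max_dist in a frame is closed.
    """
    tracks: List[Track] = []
    active: List[Track] = []
    next_id = 0
    for poses in frames_poses:
        unmatched = list(range(len(poses)))
        still_active: List[Track] = []
        for tr in active:
            if not unmatched:
                break
            center = tr.last_center()
            dists = [np.linalg.norm(poses[k].mean(axis=0) - center) for k in unmatched]
            best = int(np.argmin(dists))
            if dists[best] <= max_dist:
                tr.poses.append(poses[unmatched.pop(best)])
                still_active.append(tr)
        # Every detection left over starts a new track.
        for k in unmatched:
            tr = Track(next_id, [poses[k]])
            next_id += 1
            tracks.append(tr)
            still_active.append(tr)
        active = still_active
    return tracks


def run_pipeline(frames, detect_poses: Callable, classify_sequence: Callable,
                 min_len: int = 10) -> Dict[int, object]:
    """Pose detection -> tracking -> action recognition on each long-enough track."""
    per_frame = [detect_poses(frame) for frame in frames]
    tracks = greedy_track(per_frame)
    return {tr.track_id: classify_sequence(np.stack(tr.poses))
            for tr in tracks if len(tr.poses) >= min_len}
```

The point of the structure is that action recognition receives one pose sequence per tracked person, with shape (T, J, 2), so any skeleton-based classifier can be plugged in as `classify_sequence` without changing the detection or tracking stages.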

Method Selection

Among the different methods in the field of action recognition, we selected P-CNN [55], which is the state of the art on some datasets. In addition, to have an alternative solution that is faster than P-CNN, we proposed a method based on geometrical features of human poses. These geometrical features are fed into a Long Short-Term Memory (LSTM) network, which forms the second solution.
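The report does not list which geometrical features are used. The sketch below assumes two common choices, pairwise joint distances and a few joint angles, computed per frame from 18-keypoint OpenPose-style skeletons, and passes the resulting sequence through a single-layer LSTM written in NumPy. The joint triplets, the hidden size and the class name `TinyLSTMClassifier` are hypothetical; the weights are random, so the output only illustrates the data flow, not a trained model.

```python
import numpy as np


def pairwise_distances(pose):
    """Euclidean distances between every pair of joints in one frame (upper triangle)."""
    diff = pose[:, None, :] - pose[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(pose.shape[0], k=1)
    return dist[iu]


def joint_angle(a, b, c):
    """Angle at joint b (radians) between segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def geometric_features(seq, angle_triplets):
    """Per-frame features (assumed choice): pairwise distances + selected joint angles."""
    feats = []
    for pose in seq:
        ang = [joint_angle(pose[i], pose[j], pose[k]) for i, j, k in angle_triplets]
        feats.append(np.concatenate([pairwise_distances(pose), ang]))
    return np.stack(feats)  # shape (T, D)


def standardize(feats):
    """Zero-mean, unit-variance per feature dimension over the sequence."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMClassifier:
    """Single-layer LSTM over per-frame features; last hidden state -> softmax.

    Random, untrained weights: this shows the structure only.
    """

    def __init__(self, input_dim, hidden_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.H = hidden_dim
        # Gate weights stacked as [input, forget, cell candidate, output].
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, input_dim))
        self.U = rng.normal(0, 0.1, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0, 0.1, (n_classes, hidden_dim))
        self.b_out = np.zeros(n_classes)

    def forward(self, x_seq):
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        for x in x_seq:
            z = self.W @ x + self.U @ h + self.b
            i = sigmoid(z[:H])
            f = sigmoid(z[H:2 * H])
            g = np.tanh(z[2 * H:3 * H])
            o = sigmoid(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
        logits = self.W_out @ h + self.b_out
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()
```

A possible usage, assuming the COCO 18-keypoint layout (elbow and knee angles as triplets): `feats = standardize(geometric_features(seq, [(2, 3, 4), (5, 6, 7), (8, 9, 10), (11, 12, 13)]))`, then `TinyLSTMClassifier(feats.shape[1], 32, 3).forward(feats)` returns one probability per class (Tee shot, Putt, Standing). Such hand-crafted features are much smaller than CNN descriptors of body-part crops, which is consistent with the speed motivation given above.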

Experimental Results

Table 9 shows the results of the selected methods on the prepared Golf dataset. As Table 9 illustrates, P-CNN outperforms the LSTM with geometrical features, reaching 97.32 % accuracy versus 91.5 %.
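Table 9 reports a single accuracy per method. As a minimal sketch of how such a figure is obtained, and how a per-class breakdown (not reported here) could complement it, the code below builds a confusion matrix over the three Golf classes. The toy labels are made up for illustration and are not the report's predictions.

```python
import numpy as np

CLASSES = ["Tee shot", "Putt", "Standing"]  # the three classes of Table 8


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """Correct predictions per class divided by that class's sample count."""
    return np.diag(cm) / cm.sum(axis=1)


# Toy labels for illustration only; not the report's predictions.
y_true = ["Tee shot", "Tee shot", "Putt", "Putt", "Standing", "Standing"]
y_pred = ["Tee shot", "Standing", "Putt", "Putt", "Standing", "Standing"]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(f"accuracy = {accuracy(cm):.3f}")  # 5 of 6 correct
```

A confusion matrix shows which classes are mixed up (here, a tee shot predicted as standing), which a single accuracy value hides.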